{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "import networkx as nx\n",
    "import matplotlib.pyplot as plt\n",
    "from collections import Counter\n",
    "from custom import load_data as cf\n",
    "from custom import ecdf\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "from nxviz import CircosPlot\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Load Data\n",
    "\n",
    "We will load the [sociopatterns network]() data for this notebook. From the Konect website:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# Load the sociopatterns network data. \n",
    "G = cf.load_sociopatterns_network()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# How many nodes and edges are present?\n",
    "len(G.nodes()), len(G.edges())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Hubs: How do we evaluate the importance of some individuals in a network?\n",
    "\n",
    "Within a social network, there will be certain individuals which perform certain important functions. For example, there may be hyper-connected individuals who are connected to many, many more people. They would be of use in the spreading of information. Alternatively, if this were a disease contact network, identifying them would be useful in stopping the spread of diseases. How would one identify these people?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Approach 1: Neighbors\n",
    "\n",
    "One way we could compute this is to find out the number of people an individual is conencted to. NetworkX let's us do this by giving us a `G.neighbors(node)` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# Let's find out the number of neighbors that individual #7 has.\n",
    "# list(G.neighbors(7))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**API Note:** As of NetworkX 2.0, `G.neighbors(node)` now returns a `dict_keyiterator`, which means we have to cast them as a `list` first in order to compute its length."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Exercise\n",
    "\n",
    "Can you create a ranked list of the importance of each individual, based on the number of neighbors they have? (3 min.)\n",
    "\n",
    "Hint: One suggested output would be a list of tuples, where the first element in each tuple is the node ID (an integer number), and the second element is the number of neighbors that it has.\n",
    "\n",
    "Hint: Python's `sorted(iterable, key=lambda x:...., reverse=True)` function may be of help here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An alternative:\n",
    "\n",
    "```python\n",
    "sorted([(n, G.neighbors(n)) for n in G.nodes()], \n",
    "       key=lambda x: len(x[1]), reverse=True)[0:5]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An alternative from [**@dgerlanc**](https://github.com/dgerlanc) using generator expressions to save on memory materialization cost & time:\n",
    "\n",
    "```python\n",
    "gen = ((len(list(G.neighbors(x))), x) for x in G.nodes())\n",
    "sorted(gen, reverse=True)\n",
    "```\n",
    "\n",
    "This answer was raised on [GitHub](https://github.com/ericmjl/Network-Analysis-Made-Simple/issues/75)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Approach 2: Degree Centrality\n",
    "\n",
    "The number of other nodes that one node is connected to is a measure of its centrality. NetworkX implements a **degree centrality**, which is defined as the number of neighbors that a node has normalized to the number of individuals it could be connected to in the entire graph. This is accessed by using `nx.degree_centrality(G)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# nx.degree_centrality(G)\n",
    "\n",
    "# Uncomment the next line to show a truncated version.\n",
    "list(nx.degree_centrality(G).items())[0:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "If you inspect the dictionary closely, you will find that node 51 is the one that has the highest degree centrality, just as we had measured by counting the number of neighbors.\n",
    "\n",
    "There are other measures of centrality, namely **betweenness centrality**, **flow centrality** and **load centrality**. You can take a look at their definitions on the NetworkX API docs and their cited references. You can also define your own measures if those don't fit your needs, but that is an advanced topic that won't be dealt with here.\n",
    "\n",
    "The NetworkX API docs that document the centrality measures are here: http://networkx.readthedocs.io/en/networkx-1.11/reference/algorithms.centrality.html?highlight=centrality#module-networkx.algorithms.centrality"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Exercises\n",
    "\n",
    "The following exercises are designed to get you familiar with the concept of \"distribution of metrics\" on a graph.\n",
    "\n",
    "1. Can you create an ECDF of the distribution of degree centralities?\n",
    "2. Can you create an ECDF of the distribution of number of neighbors?\n",
    "3. Can you create a scatterplot of the degree centralities against number of neighbors?\n",
    "4. If I have `n` nodes, then how many possible edges are there in total, assuming self-edges are allowed? What if self-edges are not allowed?\n",
    "\n",
    "Exercise Time: 8 minutes.\n",
    "\n",
    "Here is what an ECDF is (https://en.wikipedia.org/wiki/Empirical_distribution_function)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Hint: You may want to use:\n",
    "\n",
    "    ecdf(list_of_values)\n",
    "    \n",
    "to get the empirical CDF x- and y-values for plotting, and \n",
    "\n",
    "    plt.scatter(x_values, y_values)\n",
    "    \n",
    "Hint: You can access the dictionary `.keys()` and `.values()` and cast them as a list.\n",
    "\n",
    "If you know the Matplotlib API, feel free to get fancy :)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Possible Answers:\n",
    "fig = plt.figure(0)\n",
    "# Get a list of degree centrality scores for all of the \n",
    "# nodes in the graph\n",
    "degree_centralities = list(\n",
    "    nx.degree_centrality(G).values())\n",
    "x, y = ecdf(degree_centralities)\n",
    "# Plot the histogram of degree centralities.\n",
    "plt.scatter(x, y)\n",
    "# Set the plot title. \n",
    "plt.title('Degree Centralities')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "fig = plt.figure(1)\n",
    "neighbors = [len(list(G.neighbors(node))) for node in G.nodes()]\n",
    "x, y = ecdf(neighbors)\n",
    "plt.scatter(x, y)\n",
    "plt.title('Number of Neighbors')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "fig = plt.figure(2)\n",
    "plt.scatter(degree_centralities, neighbors, alpha=0.1)\n",
    "plt.xlabel('Degree Centralities')\n",
    "plt.ylabel('Number of Neighbors')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Exercise\n",
    "\n",
    "Before we move on to paths in a network, see if you can use the Circos plot to visualize the network. Order and color the nodes according to the `order` keyword. (2 min.)\n",
    "\n",
    "The CircosPlot API needs documentation written; for now, I am providing the following \"on-the-spot\" docs for you."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To instantiate and draw a CircosPlot:\n",
    "\n",
    "```python\n",
    "c = CircosPlot(G, node_order='node_key', node_color='node_key')\n",
    "c.draw()\n",
    "plt.show()  # or plt.savefig(...)\n",
    "```\n",
    "\n",
    "Notes:\n",
    "\n",
    "- `'node_key'` is a key in the node metadata dictionary that the CircosPlot constructor uses for determining the colour, grouping, and ordering of the nodes.\n",
    "- In the following exercise, you may want to use `order`, which is already encoded on each node in the graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "c = CircosPlot(G, node_order='order', node_color='order')\n",
    "c.draw()\n",
    "plt.savefig('images/sociopatterns.png', dpi=300)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "What can you deduce about the structure of the network, based on this visualization?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Nodes are sorted by ID. Nodes are more connected to proximal rather than distal nodes. The data are based on people streaming through an enclosed space, so it makes sense that people are mostly connected to others proximal in order, but occasionally some oddballs stick around."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Paths in a Network\n",
    "\n",
    "Graph traversal is akin to walking along the graph, node by node, restricted by the edges that connect the nodes. Graph traversal is particularly useful for understanding the local structure (e.g. connectivity, retrieving the exact relationships) of certain portions of the graph and for finding paths that connect two nodes in the network. \n",
    "\n",
    "Using the synthetic social network, we will figure out how to answer the following questions:\n",
    "\n",
    "1. How long will it take for a message to spread through this group of friends? (making some assumptions, of course)\n",
    "2. How do we find the shortest path to get from individual A to individual B?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Shortest Path\n",
    "\n",
    "Let's say we wanted to find the shortest path between two nodes. How would we approach this? One approach is what one would call a **breadth-first search** (http://en.wikipedia.org/wiki/Breadth-first_search). While not necessarily the fastest, it is the easiest to conceptualize. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The approach is essentially as such:\n",
    "\n",
    "1. Begin with a queue of the starting node.\n",
    "2. Add the neighbors of that node to the queue.\n",
    "    1. If destination node is present in the queue, end.\n",
    "    2. If destination node is not present, proceed.\n",
    "3. For each node in the queue:\n",
    "    1. Remove node from the queue.\n",
    "    2. Add neighbors of the node to the queue. Check if destination node is present or not.\n",
    "    3. If destination node is present, end. <!--Credit: @cavaunpeu for finding bug in pseudocode.-->\n",
    "    4. If destination node is not present, continue."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Exercise\n",
    "\n",
    "Try implementing this algorithm in a function called `path_exists(node1, node2, G)`. (15 min.)\n",
    "\n",
    "The function should take in two nodes, `node1` and `node2`, and the graph `G` that they belong to, and return a Boolean that indicates whether a path exists between those two nodes or not. For convenience, also print out whether a path exists or not between the two nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Test your answer below\n",
    "def test_path_exists():\n",
    "    assert path_exists(18, 10, G)\n",
    "    assert path_exists(22, 51, G)\n",
    "    \n",
    "test_path_exists()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test_path_does_not_exist(G):\n",
    "    g = G.copy()  # so that we do not mutate original graph.\n",
    "    g.add_node(100000)\n",
    "    assert not path_exists(18, 100000, g)\n",
    "    \n",
    "test_path_does_not_exist(G)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "If you write an algorithm that runs breadth-first, the recursion pattern is likely to follow what we have done above. If you do a depth-first search (i.e. DFS), the recursion pattern is likely to look a bit different. Take it as a challenge exercise to figure out how a DFS looks like.\n",
    "\n",
    "Meanwhile... thankfully, NetworkX has a function for us to use, titled `has_path`, so we don't have to implement this on our own. :-) Check it out [here](https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.shortest_paths.generic.shortest_path.html#networkx.algorithms.shortest_paths.generic.shortest_path)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "nx.has_path(G, 400, 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "NetworkX also has [other shortest](https://networkx.github.io/documentation/stable/reference/algorithms/shortest_paths.html) path algorithms implemented. \n",
    "\n",
    "We can build upon these to build our own graph query functions. Let's see if we can trace the shortest path from one node to another."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "`nx.shortest_path(G, source, target)` gives us a list of nodes that exist within one of the shortest paths between the two nodes. (Not all paths are guaranteed to be found.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "nx.shortest_path(G, 4, 400)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Incidentally, the node list is in order as well."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Exercise\n",
    "\n",
    "Write a function that extracts the edges in the shortest path between two nodes and puts them into a new graph, and draws it to the screen. It should also return an error if there is no path between the two nodes. (5 min.)\n",
    "\n",
    "Hint: You may want to use `G.subgraph(iterable_of_nodes)` to extract just the nodes and edges of interest from the graph `G`. You might want to use the following lines of code somewhere:\n",
    "\n",
    "    newG = G.subgraph(nodes_of_interest)\n",
    "    nx.draw(newG)\n",
    "    \n",
    "newG will be comprised of the nodes of interest and the edges that connect them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Possible Answer:\n",
    "\n",
    "def extract_path_edges(G, source, target):\n",
    "    # Check to make sure that a path does exists between source and target.\n",
    "    if nx.has_path(G, source, target):\n",
    "        nodes = nx.shortest_path(G, source, target)\n",
    "        newG = G.subgraph(nodes)\n",
    "        return newG\n",
    "\n",
    "    else:\n",
    "        raise Exception('Path does not exist between nodes {0} and {1}.'.format(source, target))\n",
    "        \n",
    "newG = extract_path_edges(G, 4, 400)\n",
    "nx.draw(newG, with_labels=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Challenge Exercise (at home)\n",
    "\n",
    "These exercises below are designed to let you become more familiar with manipulating and visualizing subsets of a graph's nodes.\n",
    "\n",
    "Write a function that extracts only node, its neighbors, and the edges between that node and its neighbors as a new graph. Then, draw the new graph to screen."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "# Possible Answer\n",
    "\n",
    "def extract_neighbor_edges(G, node):\n",
    "    neighbors = list(G.neighbors(node))\n",
    "    newG = nx.Graph()\n",
    "    \n",
    "    for n1, n2 in G.edges():\n",
    "        if (n1 == node and n2 in neighbors) or (n1 in neighbors and n2 == node):\n",
    "            newG.add_edge(n1, n2)\n",
    "            \n",
    "    return newG\n",
    "\n",
    "fig = plt.figure(0)\n",
    "newG = extract_neighbor_edges(G, 23)\n",
    "nx.draw(newG, with_labels=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "def extract_neighbor_edges2(G, node):\n",
    "    neighbors = G.neighbors(node)\n",
    "    newG = nx.Graph()\n",
    "    \n",
    "    for neighbor in neighbors:\n",
    "        if (node, neighbor) in G.edges() or (neighbor, node) in G.edges():\n",
    "            newG.add_edge(node, neighbor)\n",
    "\n",
    "    return newG\n",
    "\n",
    "fig = plt.figure(1)\n",
    "newG = extract_neighbor_edges2(G, 19)\n",
    "nx.draw(newG, with_labels=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### Challenge Exercises (at home)\n",
    "\n",
    "Let's try some other problems that build on the NetworkX API. Refer to the following for the relevant functions:\n",
    "\n",
    "http://networkx.readthedocs.io/en/networkx-1.11/reference/algorithms.shortest_paths.html\n",
    "\n",
    "1. If we want a message to go from one person to another person, and we assume that the message takes 1 day for the initial step and 1 additional day per step in the transmission chain (i.e. the first step takes 1 day, the second step takes 2 days etc.), how long will the message take to spread from any two given individuals? Write a function to compute this.\n",
    "2. What is the distribution of message spread times from person to person? What about chain lengths?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "hide_input": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "# Possible answer to Question 1:\n",
    "# All we need here is the length of the path.\n",
    "\n",
    "\n",
    "def compute_transmission_time(G, source, target):\n",
    "    \"\"\"\n",
    "    Fill in code below.\n",
    "    \"\"\"\n",
    "    length = nx.shortest_path_length(G, source, target)\n",
    "    \n",
    "    time = sum(range(1, length+1))\n",
    "    \n",
    "    return time\n",
    "\n",
    "\n",
    "compute_transmission_time(G, 14, 4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "# Possible answer to Question 2:\n",
    "# We need to know the length of every single shortest path between every pair of nodes.\n",
    "# If we don't put a source and target into the nx.shortest_path_length(G) function call, then\n",
    "# we get a dictionary of dictionaries, where all source-->target-->lengths are shown.\n",
    "\n",
    "lengths = []\n",
    "times = []\n",
    "for source, sink_length in nx.shortest_path_length(G):\n",
    "    for sink, length in sink_length.items():\n",
    "        times.append(sum(range(1, length+1)))\n",
    "        lengths.append(length)\n",
    "        \n",
    "plt.figure(0)\n",
    "plt.bar(list(Counter(lengths).keys()), list(Counter(lengths).values()))\n",
    "\n",
    "plt.figure(1)\n",
    "plt.bar(list(Counter(times).keys()), list(Counter(times).values()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Hubs Revisited\n",
    "\n",
    "If a message has to be passed through the network in the shortest time possible, there may be \"bottleneck\" nodes through which information must always flow through. Such a node has a high **betweenness centrality**. This is implemented as one of NetworkX's centrality algorithms. Check out the Wikipedia page for a further description.\n",
    "\n",
    "http://en.wikipedia.org/wiki/Betweenness_centrality"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "btws = nx.betweenness_centrality(G, normalized=False)\n",
    "plt.bar(list(btws.keys()), list(btws.values()))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "## Exercise\n",
    "\n",
    "Plot betweeness centrality against degree centrality for the network data. (5 min.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dc = pd.Series(nx.degree_centrality(G))\n",
    "bc = pd.Series(nx.betweenness_centrality(G))\n",
    "\n",
    "df = pd.DataFrame({'dc': dc, 'bc': bc})\n",
    "df.plot(kind='scatter', x='dc', y='bc')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Possible answer:\n",
    "deg_centrality = nx.degree_centrality(G)\n",
    "btw_centrality = nx.betweenness_centrality(G)\n",
    "\n",
    "deg_cent_sorted = [i[1] for i in sorted(zip(deg_centrality.keys(), deg_centrality.values()))]\n",
    "btw_cent_sorted = [i[1] for i in sorted(zip(btw_centrality.keys(), btw_centrality.values()))]\n",
    "\n",
    "plt.scatter(deg_cent_sorted, btw_cent_sorted)\n",
    "plt.xlabel('degree')\n",
    "plt.ylabel('betweeness')\n",
    "plt.title('centrality scatterplot')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Think about it...**\n",
    "\n",
    "From the scatter plot, we can see that the dots don't all fall on the same line. Degree centrality and betweenness centrality don't necessarily correlate. Can you think of scenarios where this is true?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "What would be the degree centrality and betweenness centrality of the middle connecting node in the **barbell graph** below?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "nx.draw(nx.barbell_graph(5, 1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "nams",
   "language": "python",
   "name": "nams"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "navigate_num": "#000000",
    "navigate_text": "#333333",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700",
    "sidebar_border": "#EEEEEE",
    "wrapper_background": "#FFFFFF"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "297px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": true,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": false,
   "toc_section_display": "block",
   "toc_window_display": true,
   "widenNotebook": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
